Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning
Authors
Abstract
In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Otherwise, prior knowledge can be used to adjust the reward function for a new problem, in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the soft value functions of two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping, and then generalize the approach to task composition with multiple prior solutions. We validate these theoretical contributions with experiments showing that they lead to faster learning in various settings.
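To make the "soft value function" of entropy-regularized RL concrete, here is a minimal sketch of soft value iteration on a toy MDP. The two-state MDP, its rewards, and the inverse temperature `beta` are illustrative assumptions, not taken from the paper; the log-sum-exp backup and Boltzmann policy are the standard entropy-regularized forms.

```python
import numpy as np

# Toy 2-state, 2-action MDP (transitions and rewards are made up here).
gamma, beta = 0.9, 1.0  # discount factor and inverse temperature

# P[s, a, s'] : transition probabilities; r[s, a] : rewards.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(500):
    Q = r + gamma * P @ V                        # soft Q-values, shape (2, 2)
    # Soft backup: a log-sum-exp replaces the hard max of standard RL.
    V = np.log(np.exp(beta * Q).sum(axis=1)) / beta

# The soft-optimal policy is a Boltzmann distribution over the Q-values.
pi = np.exp(beta * (Q - V[:, None]))             # rows sum to 1
```

The paper's shaping and composition results relate the `V` of one such problem to the `V` of another with different rewards or dynamics, so a single converged solve like this can be reused across tasks.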
Similar resources
Reward Shaping in Episodic Reinforcement Learning
Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is only a matter of time until we see large scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming be...
Potential Based Reward Shaping for Hierarchical Reinforcement Learning
Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we inv...
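Potential Based Reward Shaping, referenced in the snippet above, adds a term F(s, s') = γΦ(s') − Φ(s) to the reward, which provably leaves the optimal policy unchanged (Ng et al., 1999). A minimal sketch, assuming a gridworld with a distance-to-goal potential as the heuristic (the goal cell and potential below are illustrative, not from the paper):

```python
GAMMA = 0.99
GOAL = (3, 3)  # assumed goal cell for the illustrative potential

def phi(state):
    """Potential: negative Manhattan distance to the assumed goal."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(r, s, s_next):
    """Environment reward plus the potential-based shaping term."""
    return r + GAMMA * phi(s_next) - phi(s)

# A step toward the goal earns a positive bonus even when the
# environment reward is zero: phi improves from -6 to -5 here.
bonus = shaped_reward(0.0, (0, 0), (1, 0))
```

Because F telescopes along any trajectory, the total shaping contribution depends only on the start and end states, which is why the optimal policy is preserved.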
Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning
Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko, Department of Computer Science, The University of York, UK. Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be give...
Plan-based reward shaping for multi-agent reinforcement learning
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...
Reward Shaping for Model-Based Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) provides a formal framework for optimal exploration-exploitation tradeoff in reinforcement learning. Unfortunately, it is generally intractable to find the Bayes-optimal behavior except for restricted cases. As a consequence, many BRL algorithms, model-based approaches in particular, rely on approximated models or real-time search methods. In this paper, we...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i6.25817